
Add asynchronous concurrent execution #3687

Open

wants to merge 3 commits into base: docs/develop
Conversation

matyas-streamhpc

No description provided.

@matyas-streamhpc matyas-streamhpc self-assigned this Nov 25, 2024
@neon60 neon60 force-pushed the async-doc branch 2 times, most recently from 1484d67 to f81588d Compare December 2, 2024 08:46
@neon60 neon60 marked this pull request as ready for review December 2, 2024 08:53
WIP
@neon60 neon60 force-pushed the async-doc branch 3 times, most recently from a8fd499 to fd5af51 Compare December 6, 2024 18:10

@randyh62 randyh62 left a comment


Left comments. Looks good overall.

Asynchronous concurrent execution
*******************************************************************************

Asynchronous concurrent execution important for efficient parallelism and


Suggested change
Asynchronous concurrent execution important for efficient parallelism and
Asynchronous concurrent execution is important for efficient parallelism and

Asynchronous concurrent execution important for efficient parallelism and
resource utilization, with techniques such as overlapping computation and data
transfer, managing concurrent kernel execution with streams on single or
multiple devices or using HIP graphs.


Suggested change
multiple devices or using HIP graphs.
multiple devices, or using HIP graphs.

data allocation/freeing all happen in the context of device streams.

Streams are FIFO buffers of commands to execute in order on a given device.
Commands which enqueue tasks on a stream all return promptly and the command is


Suggested change
Commands which enqueue tasks on a stream all return promptly and the command is
Commands which enqueue tasks on a stream all return promptly and the task is


Streams are FIFO buffers of commands to execute in order on a given device.
Commands which enqueue tasks on a stream all return promptly and the command is
executed asynchronously. Multiple streams may point to the same device and


Suggested change
executed asynchronously. Multiple streams may point to the same device and
executed asynchronously. Multiple streams can point to the same device and

Commands which enqueue tasks on a stream all return promptly and the command is
executed asynchronously. Multiple streams may point to the same device and
those streams may be fed from multiple concurrent host-side threads. Execution
on multiple streams may be concurrent but isn't required to be.


Suggested change
on multiple streams may be concurrent but isn't required to be.
on multiple streams might be concurrent but isn't required to be.
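The stream semantics described in this hunk can be illustrated with a minimal sketch. This is not part of the PR's text; it assumes a HIP-capable device, and the `scale` kernel is a placeholder invented for the example.

```cpp
#include <hip/hip_runtime.h>

// Placeholder kernel: scales each element by a factor.
__global__ void scale(float* data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    constexpr size_t n = 1 << 20;
    float* d_data;
    hipMalloc(&d_data, n * sizeof(float));
    hipMemset(d_data, 0, n * sizeof(float));

    // Two streams pointing to the same device: commands within each
    // stream execute in FIFO order, but the two streams may overlap.
    hipStream_t s0, s1;
    hipStreamCreate(&s0);
    hipStreamCreate(&s1);

    // Both launches return promptly to the host; the kernels
    // themselves execute asynchronously on the device.
    hipLaunchKernelGGL(scale, dim3(n / 512), dim3(256), 0, s0,
                       d_data, 2.0f, n / 2);
    hipLaunchKernelGGL(scale, dim3(n / 512), dim3(256), 0, s1,
                       d_data + n / 2, 2.0f, n / 2);

    // The host blocks here until each stream has drained.
    hipStreamSynchronize(s0);
    hipStreamSynchronize(s1);

    hipStreamDestroy(s0);
    hipStreamDestroy(s1);
    hipFree(d_data);
    return 0;
}
```

Note that nothing forces the two streams to run concurrently; the sketch only permits it, matching the "might be concurrent but isn't required to be" wording.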

contention for shared resources. This is because multiple kernels may attempt
to access the same GPU resources simultaneously, leading to delays.

Asynchronous kernel execution is beneficial only under specific conditions It


Suggested change
Asynchronous kernel execution is beneficial only under specific conditions It
Asynchronous kernel execution is beneficial only under specific conditions. It

or from the GPU concurrently with kernel execution. Applications can query this
capability by checking the ``asyncEngineCount`` device property. Devices with
an ``asyncEngineCount`` greater than zero support concurrent data transfers.
Additionally, if host memory is involved in the copy, it should be page-locked


Is there a reference we can provide such as Memory Management or something?
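The capability check and the page-locked host allocation described in this hunk could be sketched as follows. This is an illustration added for review context, not part of the PR; it assumes device 0 and a HIP-capable system.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    hipDeviceProp_t prop;
    hipGetDeviceProperties(&prop, 0);

    // Devices reporting a nonzero asyncEngineCount can overlap
    // data transfers with kernel execution.
    printf("asyncEngineCount: %d\n", prop.asyncEngineCount);

    if (prop.asyncEngineCount > 0) {
        // Host memory involved in an async copy should be page-locked;
        // hipHostMalloc allocates pinned host memory.
        float* h_buf;
        hipHostMalloc(reinterpret_cast<void**>(&h_buf),
                      1024 * sizeof(float), hipHostMallocDefault);

        // hipMemcpyAsync from/to h_buf can now overlap with kernels.

        hipHostFree(h_buf);
    }
    return 0;
}
```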


It is also possible to perform intra-device copies simultaneously with kernel
execution on devices that support the ``concurrentKernels`` device property
and/or with copies to or from the device (for devices that support the


Suggested change
and/or with copies to or from the device (for devices that support the
and/or with copies to or from the device (for devices that support the

Are copies to or from the device intra-device copies?
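For context, the overlap described in this hunk might look like the following sketch. It is not part of the PR; the `work` kernel is a placeholder, and actual concurrency depends on the device reporting `concurrentKernels`.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void work(float* buf) { buf[threadIdx.x] *= 2.0f; }  // placeholder

int main() {
    hipDeviceProp_t prop;
    hipGetDeviceProperties(&prop, 0);
    printf("concurrentKernels: %d\n", prop.concurrentKernels);

    float *d_a, *d_b, *d_c;
    hipMalloc(&d_a, 256 * sizeof(float));
    hipMalloc(&d_b, 256 * sizeof(float));
    hipMalloc(&d_c, 256 * sizeof(float));

    hipStream_t s0, s1;
    hipStreamCreate(&s0);
    hipStreamCreate(&s1);

    // On devices that support concurrentKernels, the intra-device
    // (device-to-device) copy on s1 may proceed while the kernel
    // on s0 is still running.
    hipLaunchKernelGGL(work, dim3(1), dim3(256), 0, s0, d_a);
    hipMemcpyAsync(d_c, d_b, 256 * sizeof(float),
                   hipMemcpyDeviceToDevice, s1);

    hipDeviceSynchronize();
    hipStreamDestroy(s0);
    hipStreamDestroy(s1);
    hipFree(d_a); hipFree(d_b); hipFree(d_c);
    return 0;
}
```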

called, control is not returned to the host thread before the device has
completed the requested task. The behavior of the host thread—whether to yield,
block, or spin—can be specified using :cpp:func:`hipSetDeviceFlags` with
specific flags. Understanding when to use synchronous calls is important for


Suggested change
specific flags. Understanding when to use synchronous calls is important for
appropriate flags. Understanding when to use synchronous calls is important for
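The flag-controlled host behavior mentioned in this hunk could be sketched like this (an illustration for review context, not part of the PR; it assumes a HIP-capable device):

```cpp
#include <hip/hip_runtime.h>

int main() {
    // Choose how the host thread waits on synchronous calls.
    // hipDeviceScheduleYield: yield the host thread while waiting;
    // alternatives include hipDeviceScheduleSpin and
    // hipDeviceScheduleBlockingSync.
    hipSetDeviceFlags(hipDeviceScheduleYield);

    // ... enqueue asynchronous work here ...

    // A synchronous call: control does not return to the host thread
    // until the device has completed all requested tasks, and the
    // flag above governs whether the host yields, spins, or blocks.
    hipDeviceSynchronize();
    return 0;
}
```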

By creating an event with :cpp:func:`hipEventCreate` and recording it with
:cpp:func:`hipEventRecord`, developers can synchronize operations across
streams, ensuring correct task execution order. :cpp:func:`hipEventSynchronize`
allows waiting for an event to complete before proceeding with the next


Suggested change
allows waiting for an event to complete before proceeding with the next
lets the application wait for an event to complete before proceeding with the next
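The cross-stream event synchronization described in this hunk might be sketched as follows. This is added for review context, not part of the PR; the `produce`/`consume` kernels are placeholders.

```cpp
#include <hip/hip_runtime.h>

// Placeholder kernels standing in for real work.
__global__ void produce(float* buf) { buf[threadIdx.x] = threadIdx.x; }
__global__ void consume(float* buf) { buf[threadIdx.x] += 1.0f; }

int main() {
    float* d_buf;
    hipMalloc(&d_buf, 256 * sizeof(float));

    hipStream_t s0, s1;
    hipStreamCreate(&s0);
    hipStreamCreate(&s1);

    hipEvent_t done;
    hipEventCreate(&done);

    hipLaunchKernelGGL(produce, dim3(1), dim3(256), 0, s0, d_buf);
    hipEventRecord(done, s0);      // mark completion of produce on s0

    // Make s1 wait for the event before running consume, establishing
    // a cross-stream ordering without blocking the host thread.
    hipStreamWaitEvent(s1, done, 0);
    hipLaunchKernelGGL(consume, dim3(1), dim3(256), 0, s1, d_buf);

    hipEventSynchronize(done);     // host waits for the event itself
    hipDeviceSynchronize();

    hipEventDestroy(done);
    hipStreamDestroy(s0);
    hipStreamDestroy(s1);
    hipFree(d_buf);
    return 0;
}
```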

sequences of kernels and memory operations as a single graph, they simplify
complex workflows and enhance performance, particularly for applications with
intricate dependencies and multiple execution stages.
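The graph workflow described in this hunk could be sketched via stream capture. This example is added for review context only, not part of the PR; the `step` kernel is a placeholder.

```cpp
#include <hip/hip_runtime.h>

__global__ void step(float* buf) { buf[threadIdx.x] += 1.0f; }  // placeholder

int main() {
    float* d_buf;
    hipMalloc(&d_buf, 256 * sizeof(float));

    hipStream_t s;
    hipStreamCreate(&s);

    // Record a sequence of kernels into a graph via stream capture.
    hipGraph_t graph;
    hipStreamBeginCapture(s, hipStreamCaptureModeGlobal);
    hipLaunchKernelGGL(step, dim3(1), dim3(256), 0, s, d_buf);
    hipLaunchKernelGGL(step, dim3(1), dim3(256), 0, s, d_buf);
    hipStreamEndCapture(s, &graph);

    // Instantiate once, then relaunch the whole sequence cheaply.
    hipGraphExec_t exec;
    hipGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    for (int i = 0; i < 10; ++i)
        hipGraphLaunch(exec, s);
    hipStreamSynchronize(s);

    hipGraphExecDestroy(exec);
    hipGraphDestroy(graph);
    hipStreamDestroy(s);
    hipFree(d_buf);
    return 0;
}
```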
